Welcome back to deep learning. So today I want to talk to you about the actual pooling implementation.
Pooling layers are an essential building block of many deep networks. The main idea behind them
is that you want to reduce the dimensionality across the spatial domain. So here we see this
small example where we summarize the information in the green rectangles, the blue rectangles,
the yellow and the red ones to only one value. So we have a two by two input that has to be mapped
to a single value. Now this of course reduces the number of parameters. It introduces a hierarchy
and allows you to work with spatial abstraction. Furthermore it reduces computational cost and
overfitting. Of course, this relies on some basic assumptions. One of them is that the features are hierarchically structured. By pooling we reduce the output size
and introduce this hierarchy that should be intrinsically present in the signal.
We talked about eyes being composed of edges and lines, and faces being a composition of eyes and mouth.
This has to be present in order to make pooling a sensible operation to be included in your network.
Here you see a three-by-three pooling layer, and we choose max pooling. In max pooling, only the largest value of a receptive field is propagated to the output. Obviously, we can also work with larger strides. Typically, the stride equals the neighborhood size such that we get one output per receptive field. The problem here is, of course, that the maximum operation adds an additional non-linearity, and therefore we also have to think about how to resolve this step in the gradient procedure. Essentially, we again use the concept of the subgradient, where we simply propagate the error into the cell that produced the maximum output. So you could say the winner takes it all.
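To make this concrete, here is a small sketch of such a pooling step (my own illustration, not code from the lecture): NumPy max pooling over non-overlapping two-by-two neighborhoods, so the stride equals the neighborhood size. The function name and the toy input are just assumptions for this example.

```python
import numpy as np

def max_pool2d(x, size=2):
    # max pooling with stride = neighborhood size: one output per receptive field
    h, w = x.shape
    blocks = x[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 3.]])
print(max_pool2d(x))
# [[4. 2.]
#  [2. 7.]]
```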
An alternative to this is average pooling, where we simply compute the average over the neighborhood. However, it does not consistently perform better than max pooling. In the backpropagation pass, the error is simply shared in equal parts and backpropagated to the respective units.
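To contrast the two backward passes, here is a hedged sketch for a single two-by-two neighborhood (again my own illustration with made-up function names): max pooling routes the entire incoming gradient to the winning cell, while average pooling distributes it in equal parts.

```python
import numpy as np

def max_pool_backward(window, grad_out):
    # winner takes it all: the full gradient goes to the cell that produced the maximum
    grad_in = np.zeros_like(window)
    idx = np.unravel_index(np.argmax(window), window.shape)
    grad_in[idx] = grad_out
    return grad_in

def avg_pool_backward(window, grad_out):
    # equal share: every cell receives grad_out divided by the number of cells
    return np.full_like(window, grad_out / window.size)

window = np.array([[1., 3.],
                   [4., 2.]])
print(max_pool_backward(window, 1.0))  # [[0. 0.] [1. 0.]]
print(avg_pool_backward(window, 1.0))  # [[0.25 0.25] [0.25 0.25]]
```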
There are many more pooling strategies, such as fractional max pooling, Lp pooling, stochastic pooling, spatial pyramid pooling, generalized pooling, and many more. There is a whole family of such strategies.
Two alternatives that we already talked about are strided and atrous (dilated) convolutions. These became really popular because you don't have to encode the max pooling as an additional step, and you save computation. Typically, people now use strided convolutions with a stride s greater than one in order to implement convolution and pooling at the same time.
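As a sketch of this idea (an illustration of mine with arbitrary layer sizes, not code from the lecture), a PyTorch convolution with stride two produces the same output resolution as a convolution followed by two-by-two max pooling, but the downsampling is learned inside the convolution itself:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one RGB image of size 32 x 32

# classical building block: convolution followed by a separate pooling step
conv_then_pool = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# strided convolution: convolution and downsampling in a single step
strided_conv = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

print(conv_then_pool(x).shape)  # torch.Size([1, 16, 16, 16])
print(strided_conv(x).shape)    # torch.Size([1, 16, 16, 16])
```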
So let's recap what our convolutional neural networks are doing. We talked about the convolutions producing feature maps and the pooling reducing the size of the respective feature maps. Then again convolutions and pooling, until we end up at an abstract representation. Finally, we had
these fully connected layers in order to do the classification. Actually we can kick out this last
block, because we've seen that if we reformat the spatial dimensions into the channel direction, then we can replace it with a one-by-one convolution. Subsequently, we just apply this to get our final classification. Hence, we can reduce the number of building blocks further: we don't even need fully connected layers anymore. Everything then becomes fully convolutional, and we can express essentially the entire chain of operations by convolutions and pooling steps. The nice thing about the one-by-one convolutions is that if you
combine this with something that is called global average pooling then you can essentially also
process input images of arbitrary size. The idea here is that, at the end of the convolutional processing, you simply map into the channel direction and compute the global average over the entire input. This works because the global pooling operation is defined independently of the input size, so the network becomes applicable to images of arbitrary size. So again, we benefit from the ideas of pooling and convolution.
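A minimal sketch of such a fully convolutional head (my own example with arbitrary channel and class counts) could look as follows: a one-by-one convolution maps into the channel direction, and a global average pool collapses the spatial dimensions, so the same network accepts inputs of different sizes.

```python
import torch
import torch.nn as nn

num_classes = 10  # arbitrary choice for this sketch

fully_conv_head = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # convolutional feature extraction
    nn.ReLU(),
    nn.Conv2d(64, num_classes, kernel_size=1),   # 1 x 1 convolution instead of a fully connected layer
    nn.AdaptiveAvgPool2d(1),                     # global average pooling over the spatial domain
    nn.Flatten(),                                # (batch, num_classes) class scores
)

for size in (32, 64, 121):                       # arbitrary input sizes
    x = torch.randn(1, 3, size, size)
    print(size, fully_conv_head(x).shape)        # always torch.Size([1, 10])
```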
An interesting concept that we will also have a look at in more detail later in this lecture is the inception model. This approach is from the paper "Going Deeper with Convolutions" [8]. Following our self-stated motto: we need to go deeper. This network won the ImageNet challenge in 2014. One incarnation is GoogLeNet, which is inspired by [4]. The idea that they presented tackles the problem of having to fix the alternation of convolution and pooling steps. Why not allow the network to learn on its own when it wants to pool and when it wants to convolve? The idea is that the network figures this out by itself during training.
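As a teaser of what such a module could look like (a simplified sketch of my own, not the actual GoogLeNet code), an inception-style block runs convolutions of different kernel sizes and a pooling branch in parallel and concatenates their outputs along the channel dimension, so that training can decide how much each operation contributes:

```python
import torch
import torch.nn as nn

class InceptionStyleBlock(nn.Module):
    # simplified inception-style block: parallel branches, concatenated along the channels
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionStyleBlock(32)(x).shape)  # torch.Size([1, 64, 28, 28])
```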
Deep Learning - Activations, Convolutions, and Pooling Part 4
This video presents max and average pooling, introduces the concept of fully convolutional networks, and hints on how this is used to build deep networks.
References:
[1] I. J. Goodfellow, D. Warde-Farley, M. Mirza, et al. “Maxout Networks”. In: ArXiv e-prints (Feb. 2013). arXiv: 1302.4389 [stat.ML].
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. In: CoRR abs/1502.01852 (2015). arXiv: 1502.01852.
[3] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, et al. “Self-Normalizing Neural Networks”. In: Advances in Neural Information Processing Systems (NIPS). Vol. abs/1706.02515. 2017. arXiv: 1706.02515.
[4] Min Lin, Qiang Chen, and Shuicheng Yan. “Network In Network”. In: CoRR abs/1312.4400 (2013). arXiv: 1312.4400.
[5] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. “Rectifier Nonlinearities Improve Neural Network Acoustic Models”. In: Proc. ICML. Vol. 30. 1. 2013.
[6] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. “Searching for Activation Functions”. In: CoRR abs/1710.05941 (2017). arXiv: 1710.05941.
[7] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning”. In: arXiv preprint arXiv:1702.03118 (2017).
[8] Christian Szegedy, Wei Liu, Yangqing Jia, et al. “Going Deeper with Convolutions”. In: CoRR abs/1409.4842 (2014). arXiv: 1409.4842.
Further Reading:
A gentle Introduction to Deep Learning